Apparatus and Method for Video Image Stitching

ABSTRACT

A method includes collecting a first image from a first video camera of a test pattern and a second image from a second video camera of the test pattern. The first image and the second image have an overlap region. The overlap region is evaluated to generate calibration parameters that accommodate for any vertical, horizontal or rotational misalignment between the first image and the second image. Calibration parameters are applied to video streams from the first video camera and the second video camera.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application 61/566,269, filed Dec. 2, 2011, entitled “Panoramic video Camera System and Related Methods”, the contents of which are incorporated herein by reference.

FIELD OF THE INVENTION

This invention relates generally to video image processing. More particularly, this invention relates to panoramic video image stitching utilizing alignment and calibration parameters for dynamic video image compensation.

BACKGROUND OF THE INVENTION

Panoramic video feeds should be combined in a seamless manner. Existing techniques that endeavor to achieve this have high power requirements and excessive processing times, particularly in the context of mobile devices, which are constrained by these parameters.

SUMMARY OF THE INVENTION

A method includes collecting a first image from a first video camera of a test pattern and a second image from a second video camera of the test pattern. The first image and the second image have an overlap region. The overlap region is evaluated to generate calibration parameters that accommodate for any vertical, horizontal or rotational misalignment between the first image and the second image. Calibration parameters are applied to video streams from the first video camera and the second video camera.

BRIEF DESCRIPTION OF THE FIGURES

The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates a calibration system utilized in accordance with an embodiment of the invention.

FIG. 2 illustrates processing operations associated with an embodiment of the invention.

FIGS. 3 and 4 illustrate image correction operations performed in accordance with an embodiment of the invention.

FIGS. 5 and 6 illustrate error correction operations performed in accordance with an embodiment of the invention.

Like reference numerals refer to corresponding parts throughout the several views of the drawings.

DETAILED DESCRIPTION OF THE INVENTION

An embodiment of the invention enables image stitching at high frame rates to produce an output video data stream created from more than one input video stream. Test and calibration processes using still photographs of charts and video capture of moving targets enable alignment of the cameras and determine the known overlap between cameras.

FIG. 1 illustrates a calibration technique utilized in accordance with an embodiment of the invention. A first camera 100 with a viewing angle 102 takes an image of a test pattern 104. Similarly, a second camera 106 with a viewing angle 108 takes an image of the test pattern 104. The viewing angles have an overlap region 110. The cameras 100 and 106 have fixed positions in mount 112. Each camera delivers image data to an image processor 114. The image processor 114 may be a computer with a central processor, graphics processing unit and/or an additional processor.

Still photographs are used to determine the fixed overlap area 110 from one image sensor to the adjacent image sensor based on alignment within the final system, or based on a standardized calibration fixture. Sharp lines of the test pattern 104 allow the image processor 114 to program the known overlap at the pixel level. That is, the image processor 114 accommodates for any vertical, horizontal or rotational misalignment. The test pattern 104 also allows the image processor 114 to determine any focus errors or areas of soft focus for each image sensor so that image correction processing can be applied later to produce uniformly sharp images.

The test pattern 104 may be in the form of a grid. In this case, the field of view of each image capture device will have areas where the image has some distortion caused by either the lens or some other irregularity in the system. To produce an output video stream from more than one video input, the images should appear to have no distortion or uniform distortion along the edge where the input streams are joined. By calculating a known distortion for each image capture device, distortion for individual cameras is corrected. Since resulting distortions for points in the image plane will be known for each image capture device, the distortion between images will be corrected by image processing to create an image with minimal distortion between camera data streams.

The test pattern 104 may also be in the form of a color chart and/or uniform grey chart. Such a test pattern allows the image processor 114 to analyze any potential differences in color accuracy, relative illumination, and relative uniformity between cameras. These differences are stored as correction factors utilized by the image processor 114 to reduce noticeable differences between image streams.
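By way of illustration only, the following sketch shows one way correction factors of this kind might be derived from a uniform grey chart. The function names, the choice of per-channel gains, and the sample pixel values are assumptions made for the example; they are not taken from the described embodiment.

```python
from statistics import mean

def channel_means(pixels):
    """Average each colour channel over a list of (r, g, b) pixels."""
    return tuple(mean(p[c] for p in pixels) for c in range(3))

def gain_factors(reference_pixels, other_pixels):
    """Per-channel gains that bring the other camera to the reference level.

    Assumes no channel of the other camera averages to zero.
    """
    ref = channel_means(reference_pixels)
    oth = channel_means(other_pixels)
    return tuple(r / o for r, o in zip(ref, oth))

def apply_gains(pixel, gains):
    """Scale one pixel by the stored gains, clamping to the 8-bit range."""
    return tuple(min(255, round(v * g)) for v, g in zip(pixel, gains))

# Hypothetical readings: both cameras view the same grey chart, but the
# second camera reads darker and slightly blue.
cam_a = [(128, 128, 128)] * 4
cam_b = [(100, 100, 110)] * 4

gains = gain_factors(cam_a, cam_b)            # stored as correction factors
corrected = apply_gains((100, 100, 110), gains)
```

The gains are computed once during calibration and then applied to every pixel of the second camera's stream, which keeps the per-frame cost to one multiplication per channel.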

This calibration process allows for an output video stream to be assembled from multiple video streams with the viewer being unable to perceive any meaningful change in image quality throughout an entire field of view, including up to 360 degrees along multiple axes.

FIG. 2 illustrates processing operations associated with an embodiment of the invention. The image processor 114 simultaneously captures frames from the first camera 100 and the second camera 106, as shown with block 200. Stored known camera distortion parameters 202 are applied by the image processor 114 at block 204. The frames may then be converted from rectangular coordinates to cylindrical or spherical coordinates 206. It is then determined whether alignment has been calculated 208. If not, alignment is calculated 210, the alignment calculation is stored 212, and processing proceeds to block 214. If so, processing proceeds directly to block 214. At block 214, adjacent frames are stitched and blended based on the alignment calculation. The stitched frames are then applied as output 216 to a display.
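The control flow of FIG. 2 can be sketched as follows. This is a minimal illustration only: the class name, the callables passed to it, and the toy frames are all hypothetical, and the real distortion correction, reprojection and blending steps are replaced by trivial stand-ins. The point shown is that the alignment is calculated once, stored, and reused for later frames.

```python
class StitchPipeline:
    """Per-frame flow of FIG. 2 with the alignment computed once and reused."""

    def __init__(self, undistort, reproject, compute_alignment, stitch):
        self.undistort = undistort                  # block 204
        self.reproject = reproject                  # block 206
        self.compute_alignment = compute_alignment  # block 210
        self.stitch = stitch                        # block 214
        self.alignment = None                       # stored result, block 212

    def process(self, frame_a, frame_b):
        a = self.reproject(self.undistort(frame_a))
        b = self.reproject(self.undistort(frame_b))
        if self.alignment is None:                  # decision, block 208
            self.alignment = self.compute_alignment(a, b)
        return self.stitch(a, b, self.alignment)    # output for block 216


# Toy stand-ins: frames are lists of pixel values and the "alignment"
# is simply the number of overlapping pixels.
calls = []

def toy_alignment(a, b):
    calls.append(1)
    return 2  # pretend the last two pixels of `a` repeat at the start of `b`

pipeline = StitchPipeline(
    undistort=lambda f: f,
    reproject=lambda f: f,
    compute_alignment=toy_alignment,
    stitch=lambda a, b, overlap: a + b[overlap:],
)
first = pipeline.process([1, 2, 3, 4], [3, 4, 5, 6])
second = pipeline.process([7, 8, 9, 10], [9, 10, 11, 12])
```

After the first frame pair, the alignment function is no longer called, which is where the per-frame saving comes from.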

Thus, through the calibration process, camera distortion and alignment parameters are obtained. These parameters are subsequently utilized as additional frames are received.

The following is an example of lens distortion correction and perspective correction applied to an image in accordance with an embodiment of the invention. Consider the following parameters.

-   Image Size: 1280 px × 720 px
-   Image Center: (640, 360)
-   Distortion Size: 1 (scaling factor, 0-1)
-   Distortion: 0.03 (distortion correction polynomial coefficient)
-   Distance Param: 103.00 px (focal length in pixels)

The following code implements operations of FIG. 2.

    vec2 tc = gl_TexCoord[0].st;

    // Lens Distortion Correction
    vec2 P = (tc / imageCenter) - 1.;  // to normalized image coordinates
    P /= distortionSize;
    vec2 Pp = P / (1. - distortion * dot(P, P));
    P *= distortionSize;
    tc = (Pp + 1.) * imageCenter;  // back to pixel coordinates

    // Cartesian coordinates
    tc -= vec2( imageCenter.x, imageCenter.y );

    // Sphere(FishEye)-to-Erectangular
    float phi = tc.s / distanceparam;
    float theta = -tc.t / distanceparam + (M_PI/2.);
    if (theta < 0.0) {
        theta = -theta;
        phi += M_PI;
    }
    if (theta > M_PI) {
        theta = M_PI - (theta - M_PI);
        phi += M_PI;
    }
    float s = sin(theta);
    vec2 v = vec2(s * sin(phi), cos(theta));
    float r = length(v);
    theta = distanceparam * atan(r, s * cos(phi));
    tc = v * (theta / r);

    // Erectangular-to-Cylindrical
    tc.t = distanceparam * atan(tc.t / distanceparam );

    // Pixel coordinates
    tc += vec2( imageCenter.x - 0.5, imageCenter.y - 0.5);
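For readers who wish to check the numbers by hand, the lens distortion correction step of the code above can be mirrored on a single pixel as in the following sketch. It assumes the example parameters listed earlier (image center (640, 360), distortion size 1, distortion 0.03). The function name is hypothetical and the sketch covers only the first step, not the subsequent coordinate conversions.

```python
def correct_lens_distortion(x, y, center=(640.0, 360.0),
                            distortion=0.03, distortion_size=1.0):
    """Mirror of the shader's lens distortion step for one pixel."""
    cx, cy = center
    # To normalized image coordinates, roughly -1..1 about the image center.
    px = (x / cx - 1.0) / distortion_size
    py = (y / cy - 1.0) / distortion_size
    # Radial scaling: Pp = P / (1 - distortion * dot(P, P)).
    scale = 1.0 / (1.0 - distortion * (px * px + py * py))
    # Back to pixel coordinates.
    return ((px * scale + 1.0) * cx, (py * scale + 1.0) * cy)

center_out = correct_lens_distortion(640.0, 360.0)   # image center is unchanged
corner_out = correct_lens_distortion(1280.0, 720.0)  # corner is pushed outward
```

With a positive coefficient the scale factor grows with distance from the image center, so the center stays fixed while points near the corners move outward the most.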

In one embodiment, the process of automated alignment uses processing techniques implemented in OpenCV to determine the offset between two images that have overlapping regions. For example, feature matching and homography calculations may be used.
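The idea of determining an offset from overlapping regions can be illustrated with a deliberately simple stand-in. The sketch below does not use OpenCV, feature matching or homography estimation; it substitutes an exhaustive search over small integer shifts that minimizes the mean absolute pixel difference. The function name, the search range and the synthetic images are assumptions made for the example.

```python
def estimate_offset(ref, img, max_shift=3):
    """Return (dx, dy) such that img[y + dy][x + dx] best matches ref[y][x].

    Both images are equally sized 2-D lists of grey values. Every integer
    shift within +/- max_shift is tried and scored over the pixels that
    remain inside both images.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(ref[y][x] - img[y + dy][x + dx])
                    count += 1
            if count == 0:
                continue
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]

# Synthetic overlap region: `img` is `ref` moved 2 px right and 1 px down,
# with the uncovered border filled with zeros.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
img = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
       for y in range(8)]

offset = estimate_offset(ref, img)
```

A search of this kind handles only translation; the feature matching and homography approach mentioned above would also account for rotation and perspective.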

In one embodiment, manual alignment may be used to manually position image frames in a panoramic frame until the overlapping regions are seamless. When dealing with a 360-degree panoramic frame, the beginning and end of the frame need to be aligned and stitched together. To accomplish this there are three parameters that need to be calibrated.

-   Overlap: The amount of pixel overlap between the beginning and end of the frame.
-   Twist: The amount of vertical error between the beginning and end of the frame.
-   Blend Width: The width of the blending area across the overlapping region.
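As an illustration of how the overlap and blend width parameters might interact, the following sketch joins two rows of pixels with a linear cross-fade. It is an assumption-laden example rather than the described implementation: the function name and the linear ramp are choices made here, the blend band is centered in the overlap, and twist (the vertical error) is not modelled; in practice the rows would first be shifted vertically by the twist amount.

```python
def blend_seam(left, right, overlap, blend_width):
    """Join two pixel rows whose last/first `overlap` pixels coincide.

    Inside the overlap, a band of `blend_width` pixels is cross-faded
    linearly from the left row to the right row.
    """
    if not 0 < blend_width <= overlap <= min(len(left), len(right)):
        raise ValueError("require 0 < blend_width <= overlap <= row length")
    start = (overlap - blend_width) // 2        # first blended pixel
    out = list(left[:len(left) - overlap])      # part seen only by the left row
    for i in range(overlap):
        a = left[len(left) - overlap + i]
        b = right[i]
        if i < start:
            w = 0.0                             # still entirely the left row
        elif i >= start + blend_width:
            w = 1.0                             # entirely the right row
        else:
            w = (i - start + 1) / (blend_width + 1)
        out.append((1 - w) * a + w * b)
    out.extend(right[overlap:])                 # part seen only by the right row
    return out

row = blend_seam([10] * 6, [20] * 6, overlap=4, blend_width=2)
```

A wider blend band hides exposure differences more smoothly, while a narrower one reduces ghosting when the two rows are not perfectly aligned.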

Those skilled in the art will appreciate that the invention may be used in the production of video camera systems that create ultra wide-angle video streams or panoramic video. The invention can also be used post-production to address camera performance issues in the field. If cameras are knocked out of alignment during use, the calibration and alignment steps can re-calibrate the system, preventing the unnecessary replacement of certain parts or the whole system.

This invention can also be used when creating camera systems in multiple axes. For example, if camera systems are stacked along a vertical axis to create a “tall” cylindrical video stream, the alignment and calibration process can be used to ensure that camera performance is set within a pre-defined standard so there is not a wide variation in image performance between the separate video camera systems.

The invention also allows multiple cameras to be calibrated within a specific performance range. If there are multiple camera systems being used at the same event, it is critical that they all have image performance within a defined range to reduce variation in image performance/quality between camera systems. The invention allows for calibration across a group of panoramic or wide-angle camera systems to ensure video outputs appear consistent to the end user.

A reduction in bandwidth can be achieved by calibrating the Lens Distortion Parameter and the Camera Field of View (FOV) for each camera in the system. When the system processes each camera's video stream frame by frame it will use the Lens Distortion Parameter and Camera FOV to correct the inherent lens distortion/perspective in each frame. Correcting the video frames with these parameters warps the pixels to a certain degree, causing the frame to be cropped in order to get rid of the warping effects. For example, FIG. 3 illustrates distortion 300 between individual images 302 and 304. As shown in FIG. 4, cropping lines 400 and 402 may be used to compensate for this warping effect and provide a more uniform viewing area that is pleasing to the eye.

The image processor 114 may execute software with calibrated parameters that reduce the overall time to stitch camera frames together. These parameters may include FOV overlap, image alignment, individual camera illumination correction and system relative illumination.

Calibrating the image alignment and exposure values for the system can reduce the overall per-frame system processing time by approximately 400 ms and 450 ms, respectively. This improvement is critical for live viewing applications, or applications in which the camera may be turned on and off frequently.

Image alignment is the parameter that defines the offset between each camera in order to stitch and produce a complete 360° panoramic video. Each camera's exposure settings should be set to be the same and change at the same rate to eliminate under-exposed and over-exposed areas in the 360-degree image. Dynamic range software can help compensate for cameras that are under- or over-exposed during difficult lighting conditions when the system is set to a single exposure level.

The alignment and calibration techniques may be used for initial system set up and initial video streaming. Further, they may be used once the system is operative and is otherwise streaming video. In such a context, real world problems occur, such as camera faults and subsequent camera displacement or offset. Techniques of the invention address these issues.

FIG. 5 illustrates processing associated with one such technique. The image processor 114 streams video from multiple cameras 500. The image processor 114 is configured to check for a faulty camera 502 (e.g., a lost signal). If a faulty camera exists, the image processor 114 maximizes the field of view of adjacent cameras 504. As shown in FIG. 1, an overlapping region 110 exists between cameras. For example, if camera 100 is faulty, the full field of view 108 of camera 106 would be utilized.
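The fallback of FIG. 5 might be sketched as follows. The data structure, the function name and the angular ranges are hypothetical. The example assumes the cameras are listed in order along a single row and that each normally contributes only part of its full field of view, leaving the overlap in reserve.

```python
from dataclasses import dataclass

@dataclass
class Camera:
    name: str
    fov: tuple            # (start, end) of the full field of view, in degrees
    used: tuple           # (start, end) normally contributed to the panorama
    faulty: bool = False  # e.g., set when the signal is lost

def maximize_adjacent_fov(cameras):
    """Widen working neighbours of each faulty camera to their full view."""
    for i, cam in enumerate(cameras):
        if not cam.faulty:
            continue
        if i > 0 and not cameras[i - 1].faulty:
            left = cameras[i - 1]
            left.used = (left.used[0], left.fov[1])     # extend toward the gap
        if i + 1 < len(cameras) and not cameras[i + 1].faulty:
            right = cameras[i + 1]
            right.used = (right.fov[0], right.used[1])  # extend toward the gap
    return cameras

# Two cameras as in FIG. 1, with an assumed 20-degree overlap (80 to 100).
cameras = [
    Camera("camera 100", fov=(0, 100), used=(0, 90), faulty=True),
    Camera("camera 106", fov=(80, 180), used=(90, 180)),
]
maximize_adjacent_fov(cameras)
```

In this example camera 106 grows from the 90 to 180 degree range to its full 80 to 180 degree range, so part of the area previously covered by camera 100 remains visible.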

FIG. 6 illustrates processing associated with a camera misalignment after initial alignment and calibration. The image processor 114 streams video from multiple cameras 600. The image processor 114 is configured to check for an offset camera 602 (e.g., a camera misaligned from its original position). If an offset camera is identified, the offset is evaluated 604. For example, the displacement between matching pixels of adjacent video streams may be evaluated. Also, outside sensors may be used to determine this displacement, such as accelerometers or digital gyroscopes. This displacement is then used to form updated alignment parameters (e.g., the original alignment parameters are compensated to incorporate the identified displacement) 606. Video streams are subsequently stitched based on the updated alignment parameters 608.
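One possible reading of the update step of FIG. 6 is sketched below. The parameter names, the use of a median over matched points, the tolerance, and the sign convention (displacement added to the stored offsets) are all assumptions made for the example.

```python
from statistics import median

def measure_displacement(matches):
    """Median shift between matched points.

    `matches` holds ((x_ref, y_ref), (x_now, y_now)) pairs. The median is
    used so that a single bad match does not skew the result.
    """
    dx = median(now[0] - ref[0] for ref, now in matches)
    dy = median(now[1] - ref[1] for ref, now in matches)
    return dx, dy

def update_alignment(alignment, displacement, tolerance=0.5):
    """Fold the measured displacement into the stored alignment parameters."""
    dx, dy = displacement
    if abs(dx) <= tolerance and abs(dy) <= tolerance:
        return alignment                      # camera has not moved
    return {
        "x_offset": alignment["x_offset"] + dx,
        "y_offset": alignment["y_offset"] + dy,
    }

# Hypothetical matches: three agree on a (3, -1) px shift, one is an outlier.
matches = [
    ((100, 50), (103, 49)),
    ((220, 80), (223, 79)),
    ((310, 120), (313, 119)),
    ((400, 60), (440, 90)),
]
original = {"x_offset": 1180, "y_offset": 0}
updated = update_alignment(original, measure_displacement(matches))
```

The same update could be driven by a displacement reported by an accelerometer or gyroscope instead of matched pixels, since only the measured shift enters the calculation.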

An embodiment of the present invention relates to a computer storage product with a computer readable storage medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs, DVDs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using JAVA®, C++, or other object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.

The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.

1. A method, comprising: collecting a first image from a first video camera of a test pattern and a second image from a second video camera of the test pattern, wherein the first image and the second image have an overlap region; evaluating the overlap region to generate calibration parameters that accommodate for any vertical, horizontal or rotational misalignment between the first image and the second image; and applying the calibration parameters to video streams from the first video camera and the second video camera.
 2. The method of claim 1 further comprising collecting camera distortion values from the first video camera and the second video camera to generate calibration parameters.
 3. The method of claim 1 further comprising collecting alignment parameters between the first video camera and the second video camera to generate calibration parameters.
 4. The method of claim 1 further comprising collecting overlap values between the first video camera and the second video camera to generate calibration parameters.
 5. The method of claim 1 further comprising collecting twist parameters between the first video camera and the second video camera to generate calibration parameters.
 6. The method of claim 1 further comprising collecting blend width parameters between the first video camera and the second video camera to generate calibration parameters.
 7. The method of claim 1 further comprising collecting field of view overlap parameters between the first video camera and the second video camera to generate calibration parameters.
 8. The method of claim 1 further comprising collecting individual camera illumination parameters from the first video camera and the second video camera to generate calibration parameters.
 9. The method of claim 1 further comprising collecting system relative illumination parameters from the first video camera and the second video camera to generate calibration parameters.
 10. The method of claim 1 further comprising applying cropping lines to the video streams to correct for warping.
 11. The method of claim 1 further comprising: identifying a faulty video camera in a stream of video from multiple video cameras; and maximizing the field of view of video cameras adjacent to the faulty video camera.
 12. The method of claim 1 further comprising: identifying an offset video camera in a stream of video from multiple video cameras; evaluating the offset; forming updated alignment parameters; and combining video based upon the updated alignment parameters. 